Improved Data Partitioning for Building Large ROLAP Data Cubes in Parallel

Authors

Ying Chen

Frank Dehne

Todd Eavis

Andrew Rau-Chaplin

Abstract

The pre-computation of data cubes is critical to improving the response time of On-Line Analytical Processing (OLAP) systems and can be instrumental in accelerating data mining tasks in large data warehouses. However, as data warehouses grow, the time needed for this pre-computation becomes a significant performance bottleneck. This paper presents an improved parallel method for generating ROLAP data cubes on a shared-nothing multiprocessor, based on a novel optimized data partitioning technique. Since no shared disk is required, our method can run on highly scalable processor clusters built from standard PCs with only local disks, connected via a data switch. The approach, which uses a ROLAP representation of the data cube, is well suited to large data warehouses and high-dimensional data, and supports the generation of both fully and partially materialized data cubes. We have implemented our new parallel shared-nothing data cube generation method and evaluated the impact of the optimized data partitioning technique. The experiments show a significant performance improvement: the new method achieves close to optimal speedup for up to 32 processors, generating a full data cube for a fact table with 16 million rows and 8 attributes in under 7 minutes. For a fact table with 256 million rows and 8 attributes, it reaches optimal speedup for 32 processors, generating a full data cube of ≈ 7 billion rows (200 gigabytes) in under 88 minutes. Compared with previous approaches, our new method significantly improves scalability with respect to both the number of processors and the I/O bandwidth (number of parallel disks).

Keywords: Data Cube, ROLAP, Parallel Computing.
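The abstract does not describe the optimized partitioning technique itself, so the following is only a minimal Python sketch of the general idea it builds on, under stated assumptions. A full data cube over d attributes consists of all 2^d group-bys (8 attributes give 256 views). Here each fact-table row is assumed to be (attr_1, ..., attr_d, measure) with a single SUM measure, "processors" are simulated as in-memory lists, and rows are split by plain hash partitioning on the first attribute. The functions `full_cube` and `parallel_cube` and this partitioning rule are hypothetical placeholders, not the authors' method.

```python
from collections import defaultdict
from itertools import combinations


def full_cube(rows, dims):
    """Return {view: {key: SUM(measure)}} for all 2**len(dims) group-bys.

    Assumption: each row is (attr_1, ..., attr_d, measure). A view is the
    tuple of dimension names it groups by, so () is the grand total.
    """
    cube = {}
    for k in range(len(dims) + 1):
        for group in combinations(range(len(dims)), k):
            agg = defaultdict(int)
            for *attrs, measure in rows:
                agg[tuple(attrs[i] for i in group)] += measure
            cube[tuple(dims[i] for i in group)] = dict(agg)
    return cube


def parallel_cube(rows, dims, p):
    """Simulate p shared-nothing processors: partition, aggregate locally, merge."""
    # Hypothetical partitioning: hash on the first attribute. This is NOT the
    # paper's optimized data partitioning scheme, only a placeholder.
    parts = [[] for _ in range(p)]
    for row in rows:
        parts[hash(row[0]) % p].append(row)

    # Each simulated processor builds a full cube over its local rows only.
    local_cubes = [full_cube(part, dims) for part in parts]

    # SUM is distributive, so partial aggregates merge by simple addition.
    merged = {}
    for cube in local_cubes:
        for view, agg in cube.items():
            target = merged.setdefault(view, defaultdict(int))
            for key, value in agg.items():
                target[key] += value
    return {view: dict(agg) for view, agg in merged.items()}


if __name__ == "__main__":
    rows = [("east", "q1", 10), ("west", "q1", 5), ("east", "q2", 7)]
    cube = parallel_cube(rows, ("region", "quarter"), p=2)
    print(len(cube))           # 4 views for 2 dimensions
    print(cube[("region",)])   # east totals 17, west totals 5 (key order may vary)
    print(cube[()])            # {(): 22}
```

The merge step works only because SUM is distributive; the sketch ignores disk I/O, inter-processor communication, and load balance, which are exactly the costs a real shared-nothing implementation has to manage.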

Similar resources

Building Large ROLAP Data Cubes in Parallel

Parallel Multi-Dimensional ROLAP Indexing

This paper addresses the query performance issue for Relational OLAP (ROLAP) datacubes. We present a distributed multi-dimensional ROLAP indexing scheme which is practical to implement, requires only a small communication volume, and is fully adapted to distributed disks. Our solution is efficient for spatial searches in high dimensions and scalable in terms of data sizes, dimensions, and numbe...

Parallel Multi-Dimensional ROLAP Indexing

This article addresses the query performance issue for Relational OLAP (ROLAP) datacubes. We present RCUBE, a distributed multidimensional ROLAP indexing scheme which is practical to implement, requires only a small communication volume, and is fully adapted to distributed disks. Our solution is efficient for spatial searches in high dimensions and scalable in terms of data sizes, dimensions, a...

RCUBE: Parallel Multi-Dimensional ROLAP Indexing

This paper addresses the query performance issue for Relational OLAP (ROLAP) datacubes. We present RCUBE, a distributed multi-dimensional ROLAP indexing scheme which is practical to implement, requires only a small communication volume, and is fully adapted to distributed disks. Our solution is efficient for spatial searches in high dimensions and scalable in terms of data sizes, dimensions, an...

Lossless Reduction of Datacubes using Partitions

Datacubes are especially useful for efficiently answering queries on data warehouses. Nevertheless, the amount of generated aggregated data is huge with respect to the initial data, which is itself very large. Recent research has addressed the issue of summarizing Datacubes in order to reduce their size. The approach presented in this paper follows a similar trend. We propose a concise representa...

Journal:

IJDWM

Volume 2, Issue

Pages: -

Publication date: 2006
